Senior Site Reliability Engineer (SRE) | Banking

Location: Quarry Bay
Type: Permanent
Post Date: Wed Jun 10 04:55:22 2026
Ref: BBBH59741

Our Investment Bank client is looking for a Senior SRE to join their infrastructure team. The role focuses on monitoring, Kubernetes reliability, and observability to ensure resilient, scalable, high-performing platforms.

Key Responsibilities

Lead reliability and observability across platforms, ensuring high availability and performance
Design, implement, and enhance monitoring solutions using tools such as Prometheus, Grafana, and Elasticsearch
Develop alerting strategies, dashboards, and end-to-end observability pipelines
Diagnose complex production incidents through log analysis, troubleshooting, and root cause investigation
Manage and optimize Kubernetes environments, including health checks, scaling, and workload stability
Administer Linux systems (RHEL), covering upgrades, patching, and performance tuning
Collaborate with engineering, infrastructure, and application teams to strengthen system resilience and scalability
Maintain logging pipelines, including ingestion, parsing, and routing into search/analytics platforms
Continuously evaluate and adopt modern SRE tools, practices, and automation approaches
Participate in on-call rotations for production support, including off-hours coverage

Key Requirements

Degree in Computer Science, Engineering, or related field
8–10 years’ experience in SRE, platform engineering, or production support environments
Strong hands-on expertise in monitoring and observability tools (e.g., Prometheus, Grafana, Elasticsearch, Kibana)
Proven experience building metrics pipelines, exporters, and integrations with long-term storage systems
Solid experience with automation and scripting (Python, Bash, Ansible, CI/CD pipelines)
Experience managing log processing pipelines (e.g., ingestion, filtering, enrichment)
Proficient in designing dashboards and analytics for distributed systems
Strong Linux administration knowledge, including troubleshooting and system optimization
Hands-on Kubernetes experience (operations, orchestration, scaling, and troubleshooting)
Understanding of SRE principles, incident management, high availability, and disaster recovery
Knowledge of networking concepts and distributed system performance tuning
Exposure to GPU-based or AI/ML infrastructure is advantageous
Self-driven, adaptable, and capable of handling multiple priorities in a fast-paced environment
Fluent in English; Cantonese and Mandarin language skills are a plus

“Sanderson-iKas” is the brand name for the following companies incorporated in Hong Kong: Sanderson Solutions International (Hong Kong) Limited (Business Registration no. 53741924) and iKas International (Asia) Limited (Business Registration no. 39818987).

Website: www.sanderson-ikas.hk
